refactor: replace deprecated runloop in fsevents #8304
Conversation
This is my first time interfacing with C from OCaml and I'm getting the following error in CI:
Some guidance around how I can reproduce this error locally and how I could go about debugging this would be helpful!
@kevinji That is a very strange error. What is your OS / arch? What kind of file system are you using?
@Alizter This is the macOS CI build error. I'm having trouble replicating it on a local M1 machine, so I'm also asking for some help setting up my dev environment. I've modified the C bindings to fsevents, so I'm wondering if I need to pass a flag for pthread support, or if I messed something up when doing FFI between C and OCaml.
I wonder if the C optimizer is doing some funky things. What C toolchain versions are you using?
On my machine both
For reference, on my local machine, I get warnings that look like
and failed expect tests like
as well as an error that seems relevant:
Just to confirm, the test is OK before this PR?
I ran EDIT: it only fails with
Actually it doesn't repro anymore after the latest force-push.
I pushed a new version that fixes the C logic. The original code freed the dispatch queue fields in the wrong place.
This area had some issues with memory safety before (#6151). We were not convinced of the soundness of the bindings, so it's not completely unexpected that touching this will reveal issues.
That's helpful to know. I pushed a new commit with the following small changes:
New updates:
When does the dispatch queue spawn new threads?
According to the macOS documentation: "Work submitted to dispatch queues executes on a pool of threads managed by the system." I'm not sure at which specific point the system actually creates the threads, though.
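For context, here is a minimal sketch of attaching an FSEvents stream to a serial dispatch queue; the helper name and queue label are illustrative, not the actual dune stubs.

```c
#include <CoreServices/CoreServices.h>
#include <dispatch/dispatch.h>

/* Attach an existing stream to a private serial queue. With a serial
 * queue, callback invocations never overlap, but the system may run
 * them on different threads from its pool over time. */
static void schedule_stream(FSEventStreamRef stream)
{
  dispatch_queue_t queue =
    dispatch_queue_create("fsevents.example", DISPATCH_QUEUE_SERIAL);
  /* Replaces the deprecated FSEventStreamScheduleWithRunLoop(). */
  FSEventStreamSetDispatchQueue(stream, queue);
  FSEventStreamStart(stream);
}
```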
src/fsevents/fsevents_stubs.c (outdated)

```c
  }
  CAMLdrop;
  caml_release_runtime_system();
  caml_c_thread_unregister();
```
We are calling unregister in every single callback invocation. Are you sure this call is cheap enough for that?
Is there a good metric for what you mean here? From what I can tell, `caml_c_thread_register` creates a thread info block and then attaches it to an existing linked list of thread info blocks, and `caml_c_thread_unregister` reverses those changes. However, I'm not sure how expensive that is.
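For reference, this is a sketch of the per-invocation pattern under discussion (hypothetical names, not the exact dune stub): each callback registers the current dispatch-managed thread with the OCaml runtime on entry and unregisters it on exit.

```c
#include <CoreServices/CoreServices.h>
#include <caml/mlvalues.h>
#include <caml/threads.h>

static void watch_callback(ConstFSEventStreamRef stream, void *info,
                           size_t num_events, void *event_paths,
                           const FSEventStreamEventFlags flags[],
                           const FSEventStreamEventId ids[])
{
  /* Register the dispatch-managed thread with the OCaml runtime ... */
  caml_c_thread_register();
  caml_acquire_runtime_system();
  /* ... translate the events and call back into OCaml here ... */
  caml_release_runtime_system();
  /* ... and tear the registration down again before returning. */
  caml_c_thread_unregister();
}
```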
I'm not so concerned about the linked list or the block, but rather the lock that has to be acquired to do this. Given that this callback in our case isn't doing much work (it just notifies which memoization nodes are out of date, hence invalidating the build), the cost of acquiring this lock can easily add up and affect the latency in watch mode.
I'd be more eager if this PR added some feature in exchange for the worse performance. Are there any concrete benefits to using dispatch queues that I'm unaware of, perhaps?
The original reason is that `FSEventStreamScheduleWithRunLoop` is deprecated, with a suggestion to use `FSEventStreamSetDispatchQueue` instead. I think in practice, since we're using a serial queue, the background thread running the callback should remain the same, but I couldn't find an easy way to manually manage the threads used for the queue, which could allow us to only run `caml_c_thread_register` as needed.
Alternatively, we could use `dispatch_get_main_queue`, which uses a serial dispatch queue on the main thread. This would have roughly the same behavior as the original code that used `CFRunLoopGetCurrent`, assuming there was originally only one thread, but I think we would probably need to run the main dispatch queue somehow, with either `dispatch_main` or `CFRunLoopRun`.
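A hedged sketch of that alternative (not what this PR does); the helper name is illustrative:

```c
#include <CoreServices/CoreServices.h>
#include <dispatch/dispatch.h>

/* Schedule the stream on the main queue. Callbacks then run on the main
 * thread, mirroring the old CFRunLoopGetCurrent() behaviour, but some
 * thread has to sit in dispatch_main() (or CFRunLoopRun()) to service
 * the queue. */
static void schedule_on_main(FSEventStreamRef stream)
{
  FSEventStreamSetDispatchQueue(stream, dispatch_get_main_queue());
  FSEventStreamStart(stream);
  dispatch_main(); /* never returns */
}
```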
As an alternative, I've introduced some thread-local storage that keeps track of whether `caml_c_thread_register` has already been called by the same thread, and only unregisters once the thread exits. Since a serial dispatch queue often reuses the same thread, this should in principle reduce the number of register/unregister calls needed, but additional testing is probably needed.
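Roughly, the register-once idea looks like this (a minimal sketch, assuming a pthread key created once at setup time; names are illustrative and the actual stub may differ):

```c
#include <pthread.h>
#include <caml/mlvalues.h>
#include <caml/threads.h>

static pthread_key_t registered_key;
static pthread_once_t registered_once = PTHREAD_ONCE_INIT;

/* Destructor: runs when a dispatch worker thread exits, undoing the
 * registration performed below. */
static void unregister_thread(void *unused)
{
  (void)unused;
  caml_c_thread_unregister();
}

static void make_key(void)
{
  pthread_key_create(&registered_key, unregister_thread);
}

/* Called at the top of every FSEvents callback: register the current
 * thread with the OCaml runtime only the first time we see it. */
static void ensure_registered(void)
{
  pthread_once(&registered_once, make_key);
  if (pthread_getspecific(registered_key) == NULL) {
    caml_c_thread_register();
    pthread_setspecific(registered_key, (void *)1); /* any non-NULL marker */
  }
}
```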
Yeah, that's a bit better. If you could do some testing to confirm that we aren't spamming `caml_c_thread_register`/`caml_c_thread_unregister`, then this should be good to go.
Another alternative that's a little more bulletproof, but requires a bit more code, is to have the code in the dispatch queue populate some sort of queue with events, and require the OCaml side to poll this queue for events in a loop. This will require some synchronization and the use of condition variables to avoid busy polling, but it's probably the best we can do here.
cc @patricoferris who discussed this issue with me.
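For illustration, a minimal sketch of that queue-based scheme (all names hypothetical): the dispatch callback only appends events under a mutex, and an OCaml-facing stub blocks on a condition variable until events arrive.

```c
#include <pthread.h>
#include <stddef.h>

struct event { char *path; struct event *next; };

static struct event *pending = NULL;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Producer: runs on the dispatch queue thread. It never touches the
 * OCaml runtime, so no caml_c_thread_register is needed here. */
static void push_event(struct event *e)
{
  pthread_mutex_lock(&lock);
  e->next = pending;
  pending = e;
  pthread_cond_signal(&nonempty);
  pthread_mutex_unlock(&lock);
}

/* Consumer: called from an OCaml thread via a blocking stub. It waits
 * until at least one event is available and takes the whole list. */
static struct event *wait_events(void)
{
  struct event *all;
  pthread_mutex_lock(&lock);
  while (pending == NULL)
    pthread_cond_wait(&nonempty, &lock);
  all = pending;
  pending = NULL;
  pthread_mutex_unlock(&lock);
  return all;
}
```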
@rgrinberg Sorry, I just got some time to come back to look at this. I can confirm that when running the test/blackbox-tests/test-cases/watching test, the same background thread is being reused and `caml_c_thread_register` is only being run once. Are there some longer workflows that I can test to make sure the behavior is also as expected?
You could try running longer builds on more serious projects. But your test seems enough to me.
If you have some spare cycles, I would suggest looking at the queue-based workaround I suggested in previous comments. I think that should guarantee this is well behaved.
Replace the use of the deprecated `FSEventStreamScheduleWithRunLoop()` with `FSEventStreamSetDispatchQueue()`.

Since a dispatch queue spawns new threads, we need to call `caml_c_thread_register()` and `caml_c_thread_unregister()` for these threads. This is done indirectly via a `pthread_key_t` so that the functions only need to be called again if the dispatch queue switches which thread is running the callback.

We replicate the blocking nature of the existing code using a mutex and condition variable. See git/git@b022600 for more details about this approach.

Fixes #7352.

Signed-off-by: Kevin Ji <[email protected]>